Quality Over Quantity: the Counter-Intuitive GenAI Key

German Lancioni

Jun 28, 2024

It’s been almost two years since OpenAI launched ChatGPT, driving increased mainstream awareness of and access to Generative AI tools. In that time, new tools and solutions seem to be launching daily. There is also a growing trend of building bigger models that consume larger quantities of training data, often with mixed results: hallucinations, categorically incorrect facts, and opinions regurgitated as universal truth. It is a reminder of the old adage that sometimes “less is more”.

Quality over Quantity

So, if using more data doesn’t translate into better results… what does? It comes down to another tried and true saying – “quality over quantity.”

At McAfee, we maniacally focus on data quality. A well-developed Generative AI model is nothing without high-quality, curated datasets to fuel it. When the quantity of data is prioritized over quality, the results are often disappointing.

How do we produce quality data? Using millions of worldwide sensors, our AI engineers and AI data specialists focus on clues that point to threats. But that’s just the first step. Our teams then curate the data to improve the quality and maximize data diversity, reducing sources of bias, cross-pollinating data sources, and enriching and standardizing samples, just to name a few of the dozens of operations conducted to ensure we’re building datasets of the highest and purest quality.
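To make a few of these curation operations concrete, here is a minimal sketch in Python. It is an illustration only, not McAfee's pipeline: the `standardize` and `curate` functions, the `(source, text, label)` sample shape, and the `max_per_source` cap are all assumptions made for this example. It standardizes each sample, drops duplicates, and caps how many samples any single source can contribute so that no one source dominates the dataset.

```python
import re
import unicodedata
from collections import defaultdict


def standardize(text: str) -> str:
    """Normalize Unicode, collapse whitespace, and lowercase the text."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def curate(samples, max_per_source=2):
    """Standardize, deduplicate, and cap per-source counts.

    `samples` is an iterable of (source, text, label) tuples.
    The cap is a crude, hypothetical stand-in for bias reduction.
    """
    seen = set()                    # standardized texts already kept
    per_source = defaultdict(int)   # samples kept so far from each source
    curated = []
    for source, text, label in samples:
        clean = standardize(text)
        if not clean or clean in seen:
            continue  # skip empty samples and duplicates
        if per_source[source] >= max_per_source:
            continue  # keep any single source from dominating
        seen.add(clean)
        per_source[source] += 1
        curated.append((source, clean, label))
    return curated


# Made-up samples, for illustration only.
raw = [
    ("sms", "Your  package is WAITING, click here", "scam"),
    ("sms", "your package is waiting, click here", "scam"),
    ("email", "Meeting moved to 3pm", "benign"),
    ("sms", "Verify your account now", "scam"),
    ("sms", "Lunch at noon?", "benign"),
]
curated = curate(raw)
```

In this sketch the second sample is dropped as a duplicate of the first once both are standardized, and the last one is dropped because the hypothetical per-source cap has been reached. A real pipeline would involve many more operations, along with review by human specialists.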

All of this translates into the most comprehensive and robust AI-based protection for our customers: more than 1.5M threat detections per week across malware, scams, phishing, and smishing, plus more than half a billion web categorizations to help ensure a safe digital journey while browsing the Internet.

Human/AI Partnership

As the capabilities of AI tools increase, so does the conversation around how technology removes humans from the equation. The reality is that humans are still an integral part of the process and key to any successful Generative AI strategy. AI is only as good as the data it’s trained on and, in McAfee’s case, the guidance provided by cybersecurity experts. That is why data curation by our cybersecurity AI specialists is crucial to the development of all of our AI systems: it mitigates potential sources of error, resulting in accurate and trusted AI solutions, and allows us to scale and share human expertise to better protect millions of customers worldwide.

Tackling cyber threats is a tall order that comes with intrinsic challenges. For example, modern scams are more subtle and less obvious even to experts, and quite often it is just the implicit intent that sets them apart from genuine (non-scam) content. Being context-aware can help navigate this landscape to more effectively detect and stop threats before they reach customers. What is more, we believe transparency and education are paramount for building a safer digital world. This is why we also invest in building explainable AI that helps users understand why a threat has been flagged and provides clues they can use to identify future threats.

Only the Beginning

The GenAI journey has only just begun. There is still a lot of work to do and a lot to look forward to as this technology continues to evolve. While it’s easy, as developers, to get caught up in the excitement, it’s also important to identify and focus on an ultimate goal and the responsible and safe steps to get there. At McAfee, we pledge to protect our customers, and we believe in the synergistic interaction between AI and Human Threat Intelligence. Together, we can deliver a trusted, world-class AI protection experience.

German Lancioni is Chief AI Scientist for the CTO Office and leads multiple data science teams working on the next generation of AI & ML based threat protection. Holding more...
